RISC OS Dynamic Linking and Shared Libraries
--------------------------------------------

A Technical View
----------------

A system-wide array of pointers, called the global object array, is maintained.
Each pointer refers to a structure that defines a currently loaded object, and
each object stores its own index into the global array. The first element is
unused because index 0 always refers to the client object.

When a client is initialised, an array that is private to the client is
generated. This is called the runtime array, and its elements correspond
directly to the elements in the global object array. Each element is four words
in size and contains the following data about the object:

word[0] - private GOT pointer
word[1] - public read/write segment pointer
word[2] - private read/write segment pointer
word[3] - read/write segment size

If the client does not use the object that the element corresponds to, then all
four words are 0. The first element in the runtime array is always the client
object.
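
As an illustration only, one runtime array element could be modelled in C as
shown below. The type and field names are hypothetical and are not taken from
the loader sources; uint32_t stands in for a 32-bit ARM word.

```c
#include <stdint.h>

/* One element of the per-client runtime array: four words, in the order
   listed above. Names are illustrative assumptions. */
typedef struct {
    uint32_t private_got;   /* word[0]: private GOT pointer */
    uint32_t public_rw;     /* word[1]: public read/write segment pointer */
    uint32_t private_rw;    /* word[2]: private read/write segment pointer */
    uint32_t rw_size;       /* word[3]: read/write segment size */
} runtime_elem;

/* Returns non-zero if the client uses the object for this element,
   i.e. the element is not all zeros. */
static int runtime_elem_in_use(const runtime_elem *e)
{
    return (e->private_got | e->public_rw | e->private_rw | e->rw_size) != 0;
}
```

Because each element is exactly 16 bytes, an object index can be turned into a
byte offset with a shift left by 4.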

A pointer to the runtime array is stored in the runtime workspace of the client.
This currently starts at 0x8034. As this workspace exists in the client's
application space, each client can store its own array at the same address.

Global Offset Table (GOT)
-------------------------
Under Linux the GOT starts with the following header:

GOT[0] = _DYNAMIC segment pointer
GOT[1] = MODULE (linker struct) provided by dynamic loader
GOT[2] = _dl_runtime_resolve - pointer to routine to link functions at runtime

Under RISC OS, the GOT header is extended from three words to five:

GOT[0] = _DYNAMIC segment pointer
GOT[1] = Object index
GOT[2] = runtime array location
GOT[3] = MODULE (linker struct) provided by dynamic loader
GOT[4] = _dl_riscos_resolve - pointer to routine to link functions at runtime

When the dynamic loader processes the libraries the client requires, it fills
in GOT[1] with the library's index into the runtime array and GOT[2] with the
location of the runtime array (currently 0x8034). These two entries are used
by the PIC register loading code.
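
The header layout can be sketched in C as a set of indices plus a helper that
models the two entries the loader fills in. The enum names and the helper
got_set_client_info() are hypothetical, chosen for this example only.

```c
#include <stdint.h>

/* Indices into the five-word RISC OS GOT header described above.
   The names are illustrative assumptions. */
enum {
    GOT_DYNAMIC       = 0,  /* _DYNAMIC segment pointer */
    GOT_OBJECT_INDEX  = 1,  /* object index */
    GOT_RUNTIME_ARRAY = 2,  /* runtime array location */
    GOT_MODULE        = 3,  /* linker struct from the dynamic loader */
    GOT_RESOLVER      = 4,  /* runtime resolver routine */
    GOT_HEADER_WORDS  = 5
};

/* Hypothetical helper: stores the library's index and the runtime array
   location, leaving the other header words untouched. */
static void got_set_client_info(uint32_t *got, uint32_t object_index,
                                uint32_t runtime_array_loc)
{
    got[GOT_OBJECT_INDEX]  = object_index;
    got[GOT_RUNTIME_ARRAY] = runtime_array_loc;
}
```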

Loading of PIC register
-----------------------
Every function within a shared library that uses global data is required to load
the PIC register within its prologue using code similar to the following:

	LDR	r7, .L0
.LPIC0:
	ADD	r7, pc, r7			@ r7 = Library public GOT
	LDMIA	r7, {r7, r8}			@ r7 = Object index
						@ r8 = runtime array location
	LDR	r8, [r8, #0]			@ r8 = runtime array
	LDR	r7, [r8, r7, LSL#4]		@ r7 = Library private GOT

	...

.L0:
	.word	_GLOBAL_OFFSET_TABLE_-(.LPIC0+4)

This is one of the effects of using the -fPIC option.

The _GLOBAL_OFFSET_TABLE_ symbol above gives the address of the public version
of the GOT, so it can't be used to access client data. It can, however, be used
to retrieve the library index and runtime array location.
Note that .LPIC0+4 (rather than .LPIC0+8) is necessary in order to point to
&GOT[1] so that a multiple load can be used to fetch GOT[1] and GOT[2].
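
The pointer chasing performed by the prologue can be modelled on a host machine
in C. This is a sketch, not the loader's code: pic_elem and load_pic_register()
are hypothetical names, and uintptr_t stands in for a 32-bit ARM word, so the
element size here is four host words rather than the 16 bytes implied by LSL#4.

```c
#include <stdint.h>

/* Host-side model of one four-word runtime array element. */
typedef struct {
    uintptr_t word[4];      /* word[0] is the private GOT pointer */
} pic_elem;

/* Given a library's public GOT, return the client's private GOT. */
static uintptr_t *load_pic_register(const uintptr_t *public_got)
{
    /* LDMIA r7, {r7, r8}: fetch GOT[1] (object index) and GOT[2]
       (the location holding the runtime array pointer). */
    uintptr_t index = public_got[1];
    pic_elem **array_loc = (pic_elem **)public_got[2];

    /* LDR r8, [r8, #0]: read the runtime array pointer itself. */
    pic_elem *runtime_array = *array_loc;

    /* LDR r7, [r8, r7, LSL#4]: word[0] of the element at 'index'. */
    return (uintptr_t *)runtime_array[index].word[0];
}
```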

Procedure Linkage Table
-----------------------

The format of the entries in the PLT depends on whether they are part of an
executable or a library.
When lazy binding, all functions initially call the first PLT entry so that
they can be dynamically linked. In the executable there is no separate public
and private GOT; they are one and the same, i.e. they have the same address. This
allows the first PLT entry to access the GOT directly in much the same way as
the ARM Linux version does, the only difference being the offset in the load
instruction:

	STR	lr, [sp, #-4]!		@ Save return address
	LDR	lr, [pc, #4]
	ADD	lr, pc, lr		@ lr = GOT
	LDR	pc, [lr, #16]!		@ Call resolver with
	&GOT[0] - .			@ r8 = &GOT[n+5], lr = &GOT[4]

In a shared library, the first PLT entry can take advantage of the fact that the caller
has already loaded the PIC register with the address of the client's private copy
of the GOT:

	STR	lr, [sp, #-4]!		@ Save return address
	MOV	lr, r7			@ lr = private GOT
	LDR	pc, [lr, #16]!		@ Call resolver with
					@ r8 = &GOT[n+5], lr = &GOT[4]

Subsequent PLT entries have a similar structure in both executables and shared
libraries, but they are not identical.
Again, in an executable, the PLT entries can take advantage of the fact that the
public and private GOT have the same address allowing code similar to ARM Linux
to be used:

	ADD	r8, pc, #0x0XX00000
	ADD	r8, r8, #0x000XX000
	LDR	pc,[r8, #0x00000XXX]!

The only difference to ARM Linux is the use of register r8 instead of r12. Under
RISC OS, r12 cannot be used because the stack extension routines require that it
be preserved.

In a shared library, a slightly different format must be used:

	ADD	r8, r7, #0x0XX00000
	ADD	r8, r8, #0x000XX000
	LDR	pc,[r8, #0x00000XXX]!

In this case, the caller has already loaded the PIC register with the client's
private copy of the GOT and the immediate values applied in the above instructions
add up to the offset of the relevant function from the start of the GOT. This is
different from the executable case above, where the immediate values add up to
the difference between the current PC and the location of the relevant function in
the GOT.
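
The way one offset is spread over the three instructions can be sketched in C.
The masks follow the 0x0XX00000, 0x000XX000 and 0x00000XXX patterns shown above;
the struct and function names are hypothetical, and the 28-bit limit is inferred
from those patterns rather than stated by the linker.

```c
#include <assert.h>
#include <stdint.h>

/* The three immediates of a PLT entry (illustrative names). */
typedef struct {
    uint32_t add_hi;    /* first ADD:  0x0XX00000 */
    uint32_t add_mid;   /* second ADD: 0x000XX000 */
    uint32_t ldr_imm;   /* LDR offset: 0x00000XXX */
} plt_imm;

/* Split an offset into the three fields. The offset is relative to the PC
   in an executable and relative to the start of the GOT in a library. */
static plt_imm plt_split_offset(uint32_t offset)
{
    plt_imm p;

    assert(offset < (1u << 28));    /* must fit the three fields */
    p.add_hi  = offset & 0x0FF00000u;
    p.add_mid = offset & 0x000FF000u;
    p.ldr_imm = offset & 0x00000FFFu;
    return p;
}
```

The three fields do not overlap, so adding them back together always yields the
original offset.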

Memory layout
-------------
The following diagram shows the memory layout of an application wimpslot after
initialisation of a dynamically linked executable:

0x8000	---------------------------------
	| Elf program			|
	|-------------------------------|
	| Dynamic loader R/W segment	|
	|-------------------------------|
	| LD Environment string		|
	|-------------------------------|
	| argv array strings		|
	|-------------------------------|
	| Library R/W segments		|
	|  allocated by Dynamic Loader	|
	|-------------------------------|
	|				|
	| Free				|
	| Memory			|
	|				|
     sl	|-------------------------------|
	| Stack				|
     sp	|-------------------------------|
	| argc (4 bytes)		|
	|-------------------------------|
	| argv array			|
	|-------------------------------|
	| LD Environment array		|
	|-------------------------------|
	| Auxiliary data array		|
	---------------------------------

When the dynamic loader links the client to the libraries it requires, it is
necessary to make copies of the library data segments for the client's own
private use. As can be seen from the diagram, these private data segments are
allocated within application space rather than the support module's dynamic
areas, which makes cleaning up easier. If the client wishes to load a shared library
at runtime using dlopen(), then further private data segments are allocated
using malloc().
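
A minimal sketch of the dlopen() case, where a private copy of a library's
read/write segment is taken with malloc(), might look as follows. The rt_entry
type and make_private_copy() are hypothetical and only model the idea; the real
loader also has to set up the private GOT pointer, which is left out here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Host-side model of the segment fields of a runtime array element. */
typedef struct {
    void  *public_rw;    /* public read/write segment */
    void  *private_rw;   /* client's private copy */
    size_t rw_size;      /* read/write segment size in bytes */
} rt_entry;

/* Copy the public segment into freshly allocated memory and record it.
   Returns 0 on success, -1 if the size is zero or allocation fails. */
static int make_private_copy(rt_entry *e, void *public_rw, size_t size)
{
    void *copy;

    if (size == 0)
        return -1;
    copy = malloc(size);
    if (copy == NULL)
        return -1;

    memcpy(copy, public_rw, size);
    e->public_rw  = public_rw;
    e->private_rw = copy;
    e->rw_size    = size;
    return 0;
}
```

Writes made through private_rw then affect only this client, while the public
segment stays untouched for other clients.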

Lee Noar.
